Sherzod Hakimov

clem:todd: A Framework for the Systematic Benchmarking of LLM-Based Task-Oriented Dialogue System Realisations

May 08, 2025
Via arXiv

Playpen: An Environment for Exploring Learning Through Conversational Interaction

Apr 11, 2025
Via arXiv

Plant in Cupboard, Orange on Table, Book on Shelf. Benchmarking Practical Reasoning and Situation Modelling in a Text-Simulated Situated Environment

Feb 17, 2025
Via arXiv

Ad-hoc Concept Forming in the Game Codenames as a Means for Evaluating Large Language Models

Feb 17, 2025
Via arXiv

Towards No-Code Programming of Cobots: Experiments with Code Synthesis by Large Code Models for Conversational Programming

Sep 18, 2024
Via arXiv

Free-text Rationale Generation under Readability Level Control

Jul 01, 2024
Via arXiv

How Many Parameters Does it Take to Change a Light Bulb? Evaluating Performance in Self-Play of Conversational Games as a Function of Model Characteristics

Jun 20, 2024
Via arXiv

Two Giraffes in a Dirt Field: Using Game Play to Investigate Situation Modelling in Large Multimodal Models

Jun 20, 2024
Via arXiv

clembench-2024: A Challenging, Dynamic, Complementary, Multilingual Benchmark and Underlying Flexible Framework for LLMs as Multi-Action Agents

May 31, 2024
Via arXiv

M2SA: Multimodal and Multilingual Model for Sentiment Analysis of Tweets

Apr 02, 2024
Via arXiv